In this paper, we introduce new parametric and semiparametric regression techniques for a recurrent event process subject to random right censoring. We develop models for the cumulative mean function and provide asymptotically normal estimators. Our semiparametric model, which relies on a single-index assumption, can be seen as a dimension reduction technique that, contrary to a fully nonparametric approach, is not struck by the curse of dimensionality when the number of covariates is high. We discuss data-driven techniques to choose the parameters involved in the estimation procedures and provide a simulation study to support our theoretical results.
1 Introduction

The modeling of recurrent events has become a crucial issue in various application fields 
of statistical inference such as clinical and epidemiological studies, insurance or actuarial 
science. Among many examples, one can mention the modeling of asthma, of epileptic 
seizures or of repeated warranty claims. In these settings, regression models are a valuable 
tool for predicting or identifying the causes which influence the number of such events
occurring during a given time period. A natural way to measure the impact of covariates 
on the recurrent event process consists of estimating the conditional cumulative mean 
function. In this paper, our aim consists of developing both parametric and semiparamet- 
ric inference for this conditional cumulative mean function. To that aim, we introduce 
new estimators and study their asymptotic behavior. We also discuss the data-driven way 
of calibrating the parameters involved in the estimation procedures. 

In the literature, various authors have studied Cox regression models adapted to the recurrent event context. For example, in the absence of dependent death, Prentice et al. (1981) considered Cox-type regression models which allow the intensity of the recurrent event process to depend on the individual's prior failure history through stratification. Allowing for independent censoring and time-dependent covariates, Andersen & Gill (1982) carried out Cox-type regression analysis for the intensity of the recurrent process, which is assumed to be a time-transformed Poisson process. Andersen et al. also adopted modeling techniques based on the intensity process in the presence of censoring under a non-homogeneous Markov assumption. Lin et al. (2000) provided asymptotic distribution theory for the fitting of Cox-type marginal models without the Poisson assumption. Lawless & Nadeau (1995) proposed a semiparametric regression model where the conditional cumulative mean function is proportional to an unknown baseline function through a coefficient that depends parametrically on the covariates. More recently, Ghosh & Lin (2003) performed semiparametric regression with a scale-change model that formulates the marginal distributions of the recurrent event process and death as two joint accelerated failure time models while leaving the dependence structure unspecified.

The main advantage of these kinds of models lies in the simplicity of the regression function. Unfortunately, compared with a purely nonparametric approach, they have the disadvantage of relying on strong modeling assumptions that may not hold in practice.

In this work, we first study a general parametric regression model for the recurrent event process. We then study a semiparametric generalization which relies on a single-index assumption. We propose a new procedure to estimate both the index and the conditional cumulative mean regression function and provide a detailed asymptotic study of the proposed estimators. This single-index model can be seen as a compromise between a parametric approach and a nonparametric one. In particular, while allowing full flexibility, the nonparametric approach is known to fail when the number of covariates is high (greater than 3 in practice), which is the so-called "curse of dimensionality". Single-index models rely on a dimension reduction assumption which allows one to achieve better convergence rates while still ensuring enough flexibility to be adapted to a large number of practical cases. This model can also be seen as a generalization of the Cox regression model. Compared to uncensored single-index models adapted to mean regression, see e.g. Ichimura (1993), in the specific setting of recurrent events the presence of censoring usually deteriorates the quality of estimation in the tail of the distribution. Therefore, in our approach, we introduce a weight function designed to compensate for the lack of information induced by censoring. The main novelty of our procedure lies in the fact that this weight function may be chosen using data-driven techniques.

The paper is organized as follows. In Section 2 we define the parametric and semiparametric models and explain the general methodology. Asymptotic results are presented in Section 3. Simulation studies are carried out in Section 4 to investigate the performance of our methods for finite sample sizes. Technical results are postponed to the Appendix in Section 6.

2 Model assumptions and methodology 

In this section, we present the general setting. Specifically, Section 2.1 introduces the different regression models. Section 2.2 presents the estimation procedures. They are based on a least-squares type criterion and on a rescaled process, defined in Section 2.2.1, which makes it possible to correct for the impact of censoring.

2.1 Regression models for recurrent events 

Consider the recurrent event process $N^*(t)$, which denotes the number of recurrent events occurring in the time interval $[0,t]$. This process can be seen as a piecewise constant function with jumps only on $[0, D]$, where $D$ can be random. In clinical applications, this time $D$ may stand for the death time of a patient. For insurance applications, $D$ can represent the warranty length (which can be random if the client has the possibility of breaking the contract) or the lifetime of the insured good. In this paper, we aim to infer on the cumulative conditional mean function given for $t \in [0, D]$ by

\[ \mu(t|z) = E[N^*(t) \mid Z = z], \]

where $Z \in \mathcal{Z} \subset \mathbb{R}^d$ is a $d$-dimensional vector of covariates.

We now present the two different models for $\mu$ that are studied throughout this paper.

Model 1 : parametric case.

\[ \mu(t|z) = \mu_0(t, z; \theta_0), \tag{2.1} \]

where $\theta_0 \in \Theta \subset \mathbb{R}^d$ is unknown and $\mu_0$ is a known function.

Model 2 : semiparametric case.

\[ \mu(t|z) = \mu_{\theta_0}(t, \theta_0' z), \tag{2.2} \]

where $\theta_0 \in \Theta \subset \mathbb{R}^d$, $\mu_\theta(t,u) = E[N^*(t) \mid \theta' Z = u]$ and the family of functions $\mathcal{F} = \{\mu_\theta : \theta \in \Theta\}$ is unknown. We impose that the first component of $\theta_0$ is 1 to identify this parameter. An equivalent condition would be to impose that $\theta_0$ is of norm 1 for some given norm on $\mathbb{R}^d$.

The appealing feature of the first model lies in the simplicity of the regression function. However, like every parametric procedure, it relies on strong assumptions which are unlikely to hold in practice. By contrast, a fully nonparametric procedure requires fewer assumptions but suffers from the so-called "curse of dimensionality" when the number of covariates is high. Therefore, the second model appears as a good compromise between the parametric approach and the nonparametric one. Indeed, it is more flexible than a fully parametric one but is not struck by the curse of dimensionality since it relies on a dimension reduction assumption. Moreover, model 2 can be seen as a generalization of widely studied models. For example, the models $\mu(t|z) = \mu_0(t)\exp(\theta_0' z)$ and $\mu(t|z) = \mu_0(t\exp(\theta_0' z))$, where $\mu_0$ is an unknown baseline function, correspond respectively to the popular Cox regression model and to the accelerated failure time model and are covered by model 2 as special cases.
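To make the single-index structure concrete, the following minimal Python sketch evaluates a Cox-type and an accelerated-failure-time-type mean function and checks that both depend on the covariates only through the index. The baseline `baseline(t) = log(1 + t)` and the numerical values are hypothetical choices made purely for illustration; they are not taken from the paper.

```python
import numpy as np

def baseline(t):
    """Hypothetical baseline mean function, chosen only for illustration."""
    return np.log1p(t)

def mu_cox(t, z, theta):
    """Cox-type mean: baseline(t) * exp(theta' z)."""
    return baseline(t) * np.exp(np.dot(theta, z))

def mu_aft(t, z, theta):
    """Accelerated-failure-time-type mean: baseline(t * exp(theta' z))."""
    return baseline(t * np.exp(np.dot(theta, z)))

# First component of theta fixed to 1, as in the identification constraint.
theta = np.array([1.0, 2.0])
z1 = np.array([2.0, 0.0])   # index theta' z1 = 2
z2 = np.array([0.0, 1.0])   # index theta' z2 = 2

# Two different covariate vectors with the same index give the same mean.
same_cox = np.isclose(mu_cox(1.5, z1, theta), mu_cox(1.5, z2, theta))
same_aft = np.isclose(mu_aft(1.5, z1, theta), mu_aft(1.5, z2, theta))
```

In both cases the covariate vector enters only through the scalar index, which is what allows a one-dimensional nonparametric step regardless of the number of covariates.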

One does not generally observe $N^*$ on the whole time interval $[0, D]$ because the random variable $D$ is subject to right-censoring. Let $C$ be a positive random variable standing for the censoring time. The observation time $T$ is then given by $T = D \wedge C$. Hence, instead of observing $N^*(t)$ for $t \in [0,D]$, one only observes $N(t) = N^*(t \wedge T)$ for $t \geq 0$. Letting $\delta = I(D \leq C)$, the observations consist of $n$ i.i.d. replications $(T_i, \delta_i, Z_i, N_i(\cdot))_{1 \leq i \leq n}$ of $(T, \delta, Z, N(\cdot))$. Let us introduce the distribution functions of the observed variables in the censored data model:

\[ H(t) = P(T \leq t), \qquad F(t) = P(D \leq t), \qquad G(t) = P(C \leq t). \]

We also define $\tau_H = \inf\{t : H(t) = 1\}$, the right endpoint of the support of the random variable $T$. In the sequel, we use some assumptions needed to identify these distribution functions.
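The observation scheme can be illustrated with a small simulation. The sketch below is only an example under assumptions that are not part of the paper: a homogeneous Poisson recurrent process, exponential terminal and censoring times, and the hypothetical function names `simulate_subject` and `count_process`.

```python
import numpy as np

def simulate_subject(rng, rate=1.0, death_scale=2.0, cens_scale=3.0):
    """Simulate one subject under illustrative (assumed) distributions.

    Returns (T, delta, observed jump times, all jump times of N*).
    """
    d = rng.exponential(death_scale)   # terminal time D
    c = rng.exponential(cens_scale)    # censoring time C, independent of (N*, D)
    # Homogeneous Poisson process on [0, D]: jump times of N*.
    n_events = rng.poisson(rate * d)
    jumps_star = np.sort(rng.uniform(0.0, d, size=n_events))
    t_obs = min(d, c)                  # T = min(D, C)
    delta = int(d <= c)                # delta = I(D <= C)
    # Jump times of the observed process N(t) = N*(min(t, T)).
    jumps_obs = jumps_star[jumps_star <= t_obs]
    return t_obs, delta, jumps_obs, jumps_star

def count_process(jumps, t):
    """Value at time t of the counting process with the given sorted jump times."""
    return int(np.searchsorted(jumps, t, side="right"))
```

For any time point, `count_process(jumps_obs, t)` coincides with `count_process(jumps_star, min(t, T))`, which is the relation between the observed and the underlying process.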

Assumption 1. Assume that

\[ P(dN^*(C) \neq 0) = 0, \qquad P(D = C) = 0. \]

This is a common assumption in the context of recurrent events. It rules out ties between the occurrence times of death, censoring and recurrent events.

Assumption 2. Assume that

\[ C \perp\!\!\!\perp (N^*, D), \]

\[ P(C \leq t \mid N^*, Z, D) = P(C \leq t \mid N^*, D) \quad \text{for } t \in [0, \tau_H]. \]

Assumption 2 holds in the particular case where $C$ is independent of $(N^*, D, Z)$ but is more general since it does not require independence between $C$ and $Z$. Similar assumptions are often considered in the literature on the Kaplan-Meier estimator for the survival distribution function, see e.g. Stute (1993).

2.2 Estimation procedure 

2.2.1 The rescaled process 

One of the difficulties we face when estimating the conditional expectation of $N^*$ is that the process $N^*$ is not directly observed. Hence, the most natural criteria we would like to use cannot be computed since they rely on $N^*$. Therefore, we introduce a rescaled process $Y$ designed to compensate for the censoring effects. We define, for $t \in [0, D]$,

\[ Y(t) = \int_0^t \frac{dN(s)}{1 - G(s-)}. \tag{2.3} \]

In the definition (2.3), the denominator decreases as $s$ grows to infinity. This means that we give more weight to the events we observe when $s$ is large, which compensates for the lack of observations due to censoring for large $s$. Under Assumptions 1 and 2 we have, for $s \in [0, D]$,

\[
\begin{aligned}
E[dN(s) \mid Z] &= E[dN^*(s \wedge C) \mid Z] \\
&= E[dN^*(s) I(s \leq C) \mid Z] \\
&= E[dN^*(s) \mid Z]\,(1 - G(s-)),
\end{aligned}
\tag{2.4}
\]

so that

\[ E[Y(t) \mid Z] = E[N^*(t) \mid Z]. \]
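The identity between the mean of the rescaled process and the mean of the uncensored process can be checked numerically. The following Monte Carlo sketch is only an illustration under assumed distributions (Poisson recurrent events, exponential $D$ and $C$, no covariates) with a known censoring distribution; the function name `mc_check` and all parameter values are hypothetical.

```python
import numpy as np

def mc_check(n=20000, t=1.0, rate=1.0, death_scale=2.0, cens_scale=3.0, seed=0):
    """Compare the Monte Carlo mean of Y(t) with E[N*(t)] in a toy model."""
    rng = np.random.default_rng(seed)
    y_sum = 0.0
    for _ in range(n):
        d = rng.exponential(death_scale)                    # terminal time D
        c = rng.exponential(cens_scale)                     # censoring time C
        jumps = rng.uniform(0.0, d, size=rng.poisson(rate * d))
        seen = jumps[jumps <= min(t, d, c)]                 # observed jumps up to t
        # For exponential censoring, 1 - G(s-) = exp(-s / cens_scale).
        y_sum += np.exp(seen / cens_scale).sum()
    # In this toy model E[N*(t)] = rate * E[min(D, t)].
    target = rate * death_scale * (1.0 - np.exp(-t / death_scale))
    return y_sum / n, target

estimate, target = mc_check()
```

Up to Monte Carlo error, `estimate` should be close to `target`, whereas the raw mean of the censored counts would be smaller.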



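However, the rescaled process $Y$ cannot be computed in practice since it relies on the distribution function $G$, which is usually unknown. But the process $Y(t)$ can be estimated, for $t \in [0, D]$, by

\[ \hat Y(t) = \int_0^t \frac{dN(s)}{1 - \hat G(s-)}, \tag{2.5} \]

where $\hat G$ denotes the Kaplan-Meier estimator of $G$ given by

\[ \hat G(t) = 1 - \prod_{i : T_i \leq t} \left( 1 - \frac{1}{\sum_{j=1}^n I(T_j \geq T_i)} \right)^{1 - \delta_i}. \]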
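As an illustration of how the estimated rescaled process could be computed, the sketch below implements a Kaplan-Meier estimator of the censoring distribution and the corresponding weighted count. It assumes no ties among the observation times; the function names `km_censoring` and `rescaled_process` are hypothetical and not taken from the paper.

```python
import numpy as np

def km_censoring(T, delta):
    """Kaplan-Meier estimator of G(t) = P(C <= t), assuming no ties among the T_i.

    Censoring plays the role of the event, i.e. only observations with
    delta == 0 contribute a factor. Returns a function s -> G_hat(s-).
    """
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    order = np.argsort(T)
    T_sorted, d_sorted = T[order], delta[order]
    n = len(T)
    at_risk = n - np.arange(n)                      # number of T_j >= T_(i)
    factors = np.where(d_sorted == 0, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)                      # 1 - G_hat at each ordered T_(i)

    def G_left(s):
        k = np.searchsorted(T_sorted, s, side="left")   # number of T_i < s
        return 0.0 if k == 0 else 1.0 - surv[k - 1]

    return G_left

def rescaled_process(jumps, G_left, t):
    """Y_hat(t): sum over observed jump times s <= t of 1 / (1 - G_hat(s-))."""
    return sum(1.0 / (1.0 - G_left(s)) for s in jumps if s <= t)
```

Because observed jump times never exceed the largest observation time, the left limit of the estimated survival function of $C$ stays positive at every jump, so the weights are finite.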



2.2.2 The parametric case 



Going back to the definition of the conditional expectation, it is quite natural to estimate $\theta_0$ in the parametric model by minimizing a least-squares-type criterion. Once again, since $N^*$ is unavailable, we consider a criterion based on the estimated rescaled process $\hat Y$.
Let $w$ denote a measure such that $w([0, \infty)) < \infty$ and define

\[ M_w(\theta, \mu_0) = \int_0^{\tau_H} E[\mu_0(t, Z; \theta)^2]\,dw(t) - 2 \int_0^{\tau_H} E[Y(t)\mu_0(t, Z; \theta)]\,dw(t). \]

By definition of the conditional expectation, the true parameter value $\theta_0$ satisfies

\[ \theta_0 = \arg\min_{\theta \in \Theta} M_w(\theta, \mu_0). \tag{2.6} \]

To estimate $\theta_0$, it is natural to replace the function $M_w$ by an empirical version, that is

\[ M_{n,w}(\theta, \mu_0) = \frac{1}{n}\sum_{i=1}^n \int_0^{T_{(n)}} \mu_0(t, Z_i; \theta)^2\,dw(t) - \frac{2}{n}\sum_{i=1}^n \int_0^{T_{(n)}} \hat Y_i(t)\mu_0(t, Z_i; \theta)\,dw(t), \]

where $T_{(n)}$ is the greatest order statistic associated with the sample $T_1, \ldots, T_n$. Then we define an estimator of $\theta_0$ as

\[ \hat\theta(w) = \arg\min_{\theta \in \Theta} M_{n,w}(\theta, \mu_0). \tag{2.7} \]
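The least-squares criterion can be sketched in a few lines when the measure $w$ is discrete. The code below is an illustration only: it assumes $w$ puts finitely many weights on grid points that do not exceed the largest observation time, takes precomputed values of the estimated rescaled process as input, and replaces a proper optimizer by a grid search. The names `criterion` and `fit_grid` are hypothetical.

```python
import numpy as np

def criterion(theta, mu0, t_grid, w_weights, Z, Y_hat):
    """Empirical least-squares criterion for a discrete measure w.

    w is assumed to be sum_k w_weights[k] * (Dirac mass at t_grid[k]).
    Z has shape (n, d); Y_hat has shape (n, K) with Y_hat[i, k] = Y_hat_i(t_grid[k]).
    """
    # mu[i, k] = mu0(t_grid[k], Z_i; theta)
    mu = np.array([[mu0(t, z, theta) for t in t_grid] for z in Z])
    # Average over subjects of the integral of mu^2 - 2 * Y_hat * mu against w.
    return float(np.mean((mu ** 2 - 2.0 * Y_hat * mu) @ w_weights))

def fit_grid(theta_candidates, mu0, t_grid, w_weights, Z, Y_hat):
    """Return the candidate value of theta with the smallest criterion."""
    values = [criterion(th, mu0, t_grid, w_weights, Z, Y_hat)
              for th in theta_candidates]
    return theta_candidates[int(np.argmin(values))]
```

With noise-free inputs, where the rows of `Y_hat` equal the model mean at some candidate value, the grid search returns that value, since the criterion differs from a sum of squared deviations only by a term that does not depend on the parameter.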

In the above definition, we emphasize the fact that this estimator depends on the choice of the measure $w$. This measure is an important feature of our procedure. First, in some situations, the statistician may wish to give more weight to some time intervals which are of higher importance. Moreover, the measure $w$ is also useful to control the rescaled process. Indeed, in equation (2.5), the denominator goes to zero when $s$ grows large, and $w$ can be designed precisely to avoid the practical problems caused by these very small denominators. Therefore, the finite sample behavior of our estimation procedure relies strongly on a wise choice of the measure $w$.

In Section 3.3 we obtain an asymptotic representation of $\hat\theta(w)$ as a process indexed by $w$ which holds uniformly in $w \in \mathcal{W}$, where $\mathcal{W}$ is a set of measures in which the statistician plans to choose $w$. We then discuss in Section 3.5 the adaptive choice of $w$.

2.2.3 The semiparametric case 

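In the semiparametric case, the family of functions $\mu_\theta$ is unknown. However, the criterion used for the parametric case can be slightly modified to estimate $\theta_0$. We can write

\[ \theta_0 = \arg\min_{\theta \in \Theta} M_w(\theta, \mu_\theta), \]

where

\[ M_w(\theta, \mu_\theta) = \int_0^{\tau_H} E[\mu_\theta(t, \theta' Z)^2]\,dw(t) - 2 \int_0^{\tau_H} E[Y(t)\mu_\theta(t, \theta' Z)]\,dw(t). \]

Using a family of nonparametric estimators $\hat\mu_\theta$ of $\mu_\theta$, we define the estimator of $\theta_0$ as

\[ \hat\theta(w) = \arg\min_{\theta \in \Theta} M_{n,w}(\theta, \hat\mu_\theta), \tag{2.8} \]

where

\[ M_{n,w}(\theta, \hat\mu_\theta) = n^{-1}\sum_{i=1}^n \int_0^{T_{(n)}} \hat\mu_\theta(t, \theta' Z_i)^2\,dw(t) - 2n^{-1}\sum_{i=1}^n \int_0^{T_{(n)}} \hat Y_i(t)\hat\mu_\theta(t, \theta' Z_i)\,dw(t). \]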

In Section 3.4 we derive an asymptotic representation of $\hat\theta(w)$ (see Theorem 3.3) regardless of the type of nonparametric estimators $\hat\mu_\theta$ used in the computation, provided these nonparametric estimators satisfy a list of uniform convergence conditions. Nevertheless, let us give a precise example of $\hat\mu_\theta$ using kernel estimators. The convergence properties of this type of estimator are derived in Section 6.2.

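Using the same arguments as in (2.4), we have from the identifiability Assumptions 1 and 2

\[ \mu_\theta(t, u) = \int_0^t \frac{E[dN(s) \mid \theta' Z = u]}{1 - G(s-)}. \tag{2.9} \]

We estimate the numerator in (2.9) using a kernel estimator and the denominator by the Kaplan-Meier estimator $\hat G$, leading to

\[ \hat\mu_\theta(t, u) = \int_0^t \frac{\sum_{i=1}^n K\left(\frac{\theta' Z_i - u}{h}\right) dN_i(s)}{\sum_{j=1}^n K\left(\frac{\theta' Z_j - u}{h}\right)\,(1 - \hat G(s-))}, \tag{2.10} \]

where $K$ is a kernel function and $h$ a bandwidth sequence tending to zero. In Section 6.2 we list some conditions on $K$ and $h$. How to choose the bandwidth from the data in practice is considered at the end of Section 3.7.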
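A kernel estimator of this type can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel, a Nadaraya-Watson form of the kernel weights, and a user-supplied callable `G_left` returning the left limit of an estimate of the censoring distribution function. All names are hypothetical.

```python
import numpy as np

def gaussian_kernel(x):
    """Standard Gaussian density, used here as an illustrative kernel K."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def mu_hat(t, u, theta, Z, jumps_list, G_left, h, kernel=gaussian_kernel):
    """Kernel-weighted average of censoring-corrected event counts up to time t.

    Z: (n, d) covariates; jumps_list[i]: observed jump times of subject i;
    G_left: callable s -> estimate of G(s-); h: bandwidth.
    """
    index = np.asarray(Z, dtype=float) @ np.asarray(theta, dtype=float)  # theta' Z_i
    k = kernel((index - u) / h)                                          # kernel weights
    denom = k.sum()
    if denom == 0.0:
        return 0.0
    total = 0.0
    for k_i, jumps in zip(k, jumps_list):
        # Integral of dN_i(s) / (1 - G_hat(s-)) over [0, t] for subject i.
        total += k_i * sum(1.0 / (1.0 - G_left(s)) for s in jumps if s <= t)
    return total / denom
```

Without censoring (`G_left` identically zero) and with all subjects sharing the same index value, the estimator reduces to the average number of events observed up to time `t`.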

3 Asymptotic results 

In this section, we provide asymptotic properties for our estimators. In Section 3.1 we first state and briefly discuss a list of technical assumptions on the model and on the different elements needed for the estimation procedures. In Section 3.2 we state our main lemma, which is the keystone of our theoretical results. In the next two sections we give asymptotic representations of $\hat\theta(w)$ for the parametric and semiparametric models. We then discuss the adaptive choice of the measure $w$ in order to improve the performance of our procedure in Section 3.5. The variance of the limiting process is estimated in Section 3.6 and the choice of the bandwidth $h$ in (2.10) is addressed in Section 3.7.

3.1 Exposition and discussion of assumptions 

In order to obtain our asymptotic results, we first need to impose some conditions on different classes of functions.
Let us introduce some notation about covering numbers. Let $\mathcal{F}$ be a class of functions with envelope $F$. Define, for a probability measure $Q$, the norm $\|\cdot\|_{p,Q}$ as the norm of $L^p(Q)$. The covering number of the class $\mathcal{F}$ for the measure $Q$, denoted by $N(\varepsilon, \mathcal{F}, \|\cdot\|_{p,Q})$, is the smallest number of $L^p(Q)$-balls of radius $\varepsilon$ needed to cover the set $\mathcal{F}$. The uniform covering number is defined as $N(\varepsilon, \mathcal{F}, \|\cdot\|_p) = \sup_Q N(\varepsilon\|F\|_{p,Q}, \mathcal{F}, \|\cdot\|_{p,Q})$, where the supremum is taken over all probability measures. In what follows, we say that a class of functions $\mathcal{F}$ is a $\|\cdot\|_p$-VC-class of functions if there exist two positive constants $\gamma$ and $c$ such that $N(\varepsilon, \mathcal{F}, \|\cdot\|_p) \leq c\varepsilon^{-\gamma}$.

A class of functions $\mathcal{F}$ is said to satisfy one of the following properties if the corresponding condition holds.

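Property 1. For a class of functions $\mathcal{F} = \{f : (t,z) \in [0, \tau_H] \times \mathcal{Z} \mapsto f(t,z)\}$ and for any $\tau < \tau_H$, define

\[ \mathcal{F}_\tau = \{f(t, \cdot),\ t \in [0, \tau]\}, \]

which is a set of functions defined on $\mathcal{Z}$. Then, for any $\tau < \tau_H$, $\mathcal{F}_\tau$ is a VC-class of functions.

Property 2. For a class of functions $\mathcal{F} = \{f : (t,z) \in [0, \tau_H] \times \mathcal{Z} \mapsto f(t,z)\}$, the family of functions defined by $\{(z,y) \mapsto \int_0^{\tau_H} y(t) f(t,z)\,dw(t),\ f \in \mathcal{F},\ w \in \mathcal{W}\}$ is Glivenko-Cantelli.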

In Section 6.3.3 in the appendix, we give a general type of sufficient conditions to fulfill these properties. It is easy to check that these technical assumptions are verified when the following conditions hold together:

- $\mathcal{F}$ is a class of polynomial functions $f(t, z)$ (with bounded coefficients),

- $dE[Y(t)] = g(t)\,dt$ for some polynomial function $g(t)$,

- the class of measures is of the form $\mathcal{W} = \{w : dw(t) = W_0(t)\,d\tilde w(t)\}$, where $W_0(t)$ is a decreasing function (of order $t^{-k}$ for $k$ sufficiently high, or exponential) and where $\tilde w$ belongs to a sufficiently small class of monotone positive uniformly bounded functions (for example, piecewise constant bounded functions with a finite number of jumps).

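Property 3. Let $\mathcal{F} = \{f_\theta : (t, z) \in [0, \tau_H] \times \mathcal{Z} \mapsto f_\theta(t, z),\ \theta \in \Theta\}$ be a family of functions indexed by $\theta$. For any $f_{\theta_1}, f_{\theta_2} \in \mathcal{F}$ and $z \in \mathcal{Z}$, we have

\[ \sup_{w \in \mathcal{W}} \int_0^{\tau_H} \|f_{\theta_1}(t,z) - f_{\theta_2}(t,z)\|\,dw(t) \leq c\,\|\theta_1 - \theta_2\|, \]

where $c$ is a positive constant.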

We now introduce the assumptions needed to derive the asymptotic normality of $\hat\theta$ in the parametric and semiparametric models.

Assumptions for the parametric model. 

In the estimation procedures, we consider integrated versions of the rescaled process with respect to a measure $w$ belonging to a class of measures $\mathcal{W}$. Detailed comments on this family and its role in the statistical procedure are given in Section 3.5. We need the following assumption for this class of measures.

Assumption 3. Assume there exist some probability measure $w_0$ and a positive constant $C_0$ such that, for any $w \in \mathcal{W}$,

\[ \int_t^{\tau_H} dw(s) \leq C_0 W_0(t), \]

where $W_0(t) = \int_t^{\tau_H} dw_0(s)$ can be written as

\[ W_0(t) = W_1(t) W_2(t), \]

where $W_1$ and $W_2$ are two positive and non-increasing functions satisfying
(1) /;- Wl{t){l - F{t-))-\l - G{t-))-'dG{t) < 00, 



(2) $\int_0^{\tau_H} W_2(t)\,E[dN^*(t)] < \infty$,

(3) $\lim_{t \to \tau_H} W_2(t) = 0$.

In particular, Assumption 3 holds when all the measures $w$ have their support included in a common compact subspace strictly included in $[0, \tau_H]$. On the other hand, since the function $W_1$ controls $1 - G(s-)$ in $Y(s)$ for $s$ in the vicinity of the tail of the distribution, Assumption 3 also allows one to consider measures $w$ which are supported in the
whole interval [0,th]. Taking Wi{t) = (1 - H{t-)y/^{l - G{t-)y for some e>0 would 
be sufficient to obtain (1). Moreover, in the case where $\tau_H = \infty$, if we suppose that, for $\beta_1 > 0$, we have $E[N^*(t)] \sim \beta_1 t$ when $t \to \infty$, we could take for example $W_2(t) = t^{-\beta_2}$ for $\beta_2 > 1$ to fulfill (2) and (3).

We also need the following Hölder condition on the process $N$. This is a technical assumption used in the proof of our main lemma.




Assumption 4. Suppose there exists $\gamma > 0$ such that

\[ E\left[ \sup_{t \leq T,\, t' \leq T} \frac{|N(t) - N(t')|}{|t - t'|^{\gamma}} \right] < \infty. \]

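Let $\nabla_\theta \mu_0(s, z; \theta_1)$ (resp. $\nabla_\theta^2 \mu_0(s, z; \theta_1)$) denote the vector of partial derivatives (resp. the Hessian matrix) of $\mu_0(s, z; \theta)$ with respect to all the components of $\theta$ evaluated at $\theta_1$. The following assumption can be understood as a regularity assumption on the regression model.

Assumption 5. Assume that, for all $w \in \mathcal{W}$, the matrix

\[ \Sigma_{w,p} = \int_0^{\tau_H} E[\nabla_\theta \mu_0(t, Z; \theta_0) \nabla_\theta \mu_0(t, Z; \theta_0)']\,dw(t) \]

is invertible. Moreover, assume that the classes of functions $\{\mu_0(\cdot,\cdot;\theta), \theta \in \Theta\}$, $\{\nabla_\theta \mu_0(\cdot,\cdot;\theta), \theta \in \Theta\}$ and $\{\nabla_\theta^2 \mu_0(\cdot,\cdot;\theta), \theta \in \Theta\}$ satisfy Properties 1, 2 and 3.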



Additional assumptions for the semiparametric model. 

The following assumption is similar to Assumption 5. Here, $\nabla_\theta \mu_{\theta_1}(s, z)$ (resp. $\nabla_\theta^2 \mu_{\theta_1}(s, z)$) denotes the vector of partial derivatives (resp. the Hessian matrix) of $\mu_\theta(s, \theta' z)$ with respect to all the components of $\theta$ evaluated at $\theta_1$. Note that the gradient vector $\nabla_\theta \mu_{\theta_1}(s, z)$ does not only depend on $\theta' z$ but also depends on the whole vector $z$. We give an explicit expression of this gradient in Lemma 6.5.

Assumption 6. Assume that, for all $w \in \mathcal{W}$, the matrix

\[ \Sigma_{w,sp} = \int_0^{\tau_H} E[\nabla_\theta \mu_{\theta_0}(t, Z) \nabla_\theta \mu_{\theta_0}(t, Z)']\,dw(t) \]

is invertible. Moreover, assume that the classes of functions $\{\mu_\theta(\cdot,\cdot), \theta \in \Theta\}$, $\{\nabla_\theta \mu_\theta(\cdot,\cdot), \theta \in \Theta\}$ and $\{\nabla_\theta^2 \mu_\theta(\cdot,\cdot), \theta \in \Theta\}$ satisfy Properties 1, 2 and 3.



As announced, we need uniform convergence properties for the nonparametric estimators $\hat\mu_\theta$.

Assumption 7. Define fig{t,u) = sup(/ie(t, -u), 1). 

(1) Assume that 

flg{t,9'z) - fig{t,9'z) 



sup 



sup 

t<T(„),6'ee,^e2 



Ug,{t,9',zY^+>^'^ 
Vefl9{t,z) -Vgflg(t,z) 



sup 

t<T(„)fie@,zez 



f,g^{t,9',z)^^+''^ 

Wlfieit.z) -Vli2e{t,z) 



fig,{t,9',z)^^+'- 
where Ai, A2 are such that Ai + A2 > 1. 



Op(l) 
Op(l) 
Op(l) 
(2) Assume also that



sup 



Mt,0'Qz) - fi9,,{t,9'Qz) 



sup 

t<T(^„),zez 




Op{en), 



Opie' 



where Sns'^ = Op{n ^/^). 
Assumption 8. Assume that 



TH 



sup / fieoit,0oz)'^^^^^^^Uw{t) < oo 
zez Jo 

where $\lambda_1, \lambda_2$ were defined in Assumption 7.

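The following assumption is essential to the empirical process theory used in our proofs. We assume that the nonparametric estimators and $\mu_{\theta_0}$ belong to some Donsker classes of functions.

Assumption 9. Assume that there exist some Donsker classes of functions $\mathcal{G}$ and $\mathcal{H}$ such that, for all $w \in \mathcal{W}$,

\[ (z,y) \mapsto \int_0^{\tau_H} (\mu_{\theta_0}(t, \theta_0' z) - y(t))\,\nabla_\theta \mu_{\theta_0}(t, z)\,dw(t) \in \mathcal{G}, \]

\[ z \mapsto \int_0^{\tau_H} \mu_{\theta_0}(t, \theta_0' z)\,\nabla_\theta \mu_{\theta_0}(t, z)\,dw(t) \in \mathcal{H}. \]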



Moreover, assume that, almost surely for n large enough. 



TH 



{z, y) I — y {f^eoit, O'qz) - y{t))Vefieo{t, z)dw{t) e Q, 



TH 



fig^it, 9'qz)V enoait, z)dw{t) e T-L. 



To give examples of such classes, consider $\mathcal{F}$ and $\mathcal{W}$ as defined in the discussion following Property 2 and suppose, in addition, that the functions $(t,u) \mapsto W_0(t) f(t,u)$ for $f \in \mathcal{F}$ ($f$ is defined on $\mathbb{R}^2$ since $\theta_0' Z \in \mathbb{R}$) are twice continuously differentiable with bounded derivatives up to order 2. It follows from the results of Section 6.3.3 and from the decomposition of the gradient vector $\nabla_\theta \mu_{\theta_0}(t, z)$ obtained in Lemma 6.5 that we can consider $\mathcal{H} = \mathcal{G} = \mathcal{F}' + z\mathcal{F}'$, where $\mathcal{F}' = \{(u,y) \mapsto \int_0^{\tau_H} (f_1(t,u) - y(t)) f_2(t,u)\,dw(t),\ w \in \mathcal{W},\ f_1, f_2 \in \mathcal{F}\}$.






3.2 The main lemma 

From a theoretical viewpoint, the main issue lies in studying the difference between $Y$ and its estimated version. The following lemma provides an asymptotic representation for a class of empirical sums in which the process $\hat Y$ is involved.

Such asymptotic representations have become very valuable tools for inference in survival analysis, since they allow one to reduce a non-i.i.d. problem to a quantity that can be easily studied using the central limit theorem. See e.g. Stute (1995), Van Keilegom & Akritas (1999), Sánchez Sellero et al. (2005) or Lopez (2009) for some similar results in other frameworks.

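Lemma 3.1. Let $\mathcal{F}$ be a class of functions with bounded envelope $F$ satisfying Property 1 and assume that Assumptions 3 and 4 hold. Define, for any function $f \in \mathcal{F}$,

\[ S_n(f, w) = \frac{1}{n}\sum_{i=1}^n \int_0^{T_{(n)}} Y_i(t) f(t, Z_i)\,dw(t) \]

and

\[ \hat S_n(f, w) = \frac{1}{n}\sum_{i=1}^n \int_0^{T_{(n)}} \hat Y_i(t) f(t, Z_i)\,dw(t). \]

(1) Assume that $\sup_{w \in \mathcal{W}} E[S_n(F, w)] < \infty$. Then, for all $f \in \mathcal{F}$,

\[ \hat S_n(f, w) - S_n(f, w) = \frac{1}{n}\sum_{i=1}^n \int_0^{\tau_H}\!\!\int_0^t \eta_{s-}(T_i, \delta_i)\,E[f(t, Z)\,d\mu(s|Z)]\,dw(t) + R_n(f, w), \]

where

\[ \eta_t(T, \delta) = \frac{(1 - \delta) I(T \leq t)}{1 - H(T-)} - \int_0^t \frac{I(T \geq s)\,dG(s)}{[1 - H(s-)][1 - G(s-)]} \]

and where

\[ \sup_{f \in \mathcal{F},\, w \in \mathcal{W}} |R_n(f, w)| = o_P(n^{-1/2}). \]

Moreover, if the measures $w$ are all supported in $[0, \tau]$ for some $\tau < \tau_H$, then

\[ \sup_{f \in \mathcal{F},\, w \in \mathcal{W}} |R_n(f, w)| = O_P(n^{-1} \log n). \]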

(2) If f denotes a family of nonparametric estimators of functions f ^ T satisfying 
sup/e^ 11/ - /lloo = op(l), then 

sup \Sn{f,w) - Sn{f,w)\ =Op{nr^^^). 

wew 
Moreover, if the measures w are all supported in [0, r] for some r < Th, then 

sup \Sn{f,w) - Sn{f,w)\ = Op{n~^\ogn). 



The proof is postponed to the Appendix. With the estimated rescaled process $\hat Y$ at hand, we can now propose M-estimation procedures to estimate the regression function in both the parametric and semiparametric cases.

3.3 Asymptotic normality of $\hat\theta$ in the parametric case

Let $\Longrightarrow$ denote weak convergence.

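Theorem 3.2. Assume that (2.1) holds. Under Assumptions 1 to 5, the estimator in (2.7) admits the following asymptotic representation

\[
\hat\theta(w) - \theta_0 = \Sigma_{w,p}^{-1} \left\{ \frac{1}{n}\sum_{i=1}^n \left( \int_0^{\tau_H} [Y_i(t) - \mu_0(t, Z_i; \theta_0)]\,\nabla_\theta \mu_0(t, Z_i; \theta_0)\,dw(t)
+ \int_0^{\tau_H}\!\!\int_0^t \eta_{s-}(T_i, \delta_i)\,E[\nabla_\theta \mu_0(t, Z; \theta_0)\,d\mu_0(s, Z; \theta_0)]\,dw(t) \right) \right\} + R_n(w),
\]

where $\sup_{w \in \mathcal{W}} |R_n(w)| = o_P(n^{-1/2})$. As a consequence, for any $w \in \mathcal{W}$,

\[ \sqrt{n}\,(\hat\theta(w) - \theta_0) \Longrightarrow \mathcal{N}(0, V_{w,p}), \]

where $V_{w,p} = \Sigma_{w,p}^{-1} \Delta_{w,p} \Sigma_{w,p}^{-1}$ and $\Delta_{w,p}$ is the covariance matrix of each term of the i.i.d. sum in the asymptotic expansion.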

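Proof. Write

\[ M_{n,w}(\theta, \mu_0) = -2 \hat S_n(\mu_0(\cdot, \cdot; \theta), w) + n^{-1}\sum_{i=1}^n \int_0^{T_{(n)}} \mu_0(t, Z_i; \theta)^2\,dw(t). \]

Then, use the asymptotic representation of Lemma 3.1. Uniform consistency of $\hat\theta(w)$ follows from the uniform convergence of $M_{n,w}(\theta, \mu_0)$, which is obtained from Properties 2 and 3 for the classes of functions $\{\mu_0(\cdot,\cdot;\theta), \theta \in \Theta\}$ and $\{\nabla_\theta \mu_0(\cdot,\cdot;\theta), \theta \in \Theta\}$ (see Assumption 5).

To obtain the uniform CLT property for $\hat\theta(w)$, use a Taylor expansion of $\nabla_\theta M_{n,w}(\theta, \mu_0)$ around $\theta_0$:

\[ \nabla_\theta M_{n,w}(\hat\theta, \mu_0) = \nabla_\theta M_{n,w}(\theta_0, \mu_0) + \nabla_\theta^2 M_{n,w}(\tilde\theta, \mu_0)(\hat\theta - \theta_0), \tag{3.1} \]

for some $\tilde\theta$ between $\hat\theta$ and $\theta_0$. The left-hand side of (3.1) is zero by definition of $\hat\theta$. Moreover, the matrix $\nabla_\theta^2 M_{n,w}(\tilde\theta, \mu_0)$ is almost surely invertible for $n$ large enough under Assumption 5 since $\hat\theta$ (and consequently $\tilde\theta$) tends to $\theta_0$ almost surely. This leads to

\[ \hat\theta - \theta_0 = -\nabla_\theta^2 M_{n,w}^{-1}(\tilde\theta, \mu_0)\,\nabla_\theta M_{n,w}(\theta_0, \mu_0). \]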



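Write

\[
\nabla_\theta^2 M_{n,w}(\tilde\theta, \mu_0) = -2 \left[ \hat S_n(\nabla_\theta^2 \mu_0(\cdot, \cdot; \tilde\theta), w) - \frac{1}{n}\sum_{i=1}^n \int_0^{\tau_H} \left( \nabla_\theta \mu_0(t, Z_i; \tilde\theta) \nabla_\theta \mu_0(t, Z_i; \tilde\theta)' + \mu_0(t, Z_i; \tilde\theta) \nabla_\theta^2 \mu_0(t, Z_i; \tilde\theta) \right) dw(t) \right] + R_n(\tilde\theta, w),
\]

where $R_n(\tilde\theta, w)$ comes from the change in the integration bounds of $[0, T_{(n)}]$ by $[0, \tau_H]$ and can be seen to tend uniformly to zero by Lebesgue's dominated convergence theorem since the term inside the integral is bounded. From Lemma 3.1, the almost sure convergence of $\hat\theta$ and the fact that $\{\nabla_\theta^2 \mu_0(\cdot,\cdot;\theta), \theta \in \Theta\}$ satisfies Property 3 (see Assumption 5), we get that $\hat S_n(\nabla_\theta^2 \mu_0(\cdot,\cdot;\tilde\theta), w)$ converges to $\int_0^{\tau_H} E[Y(t) \nabla_\theta^2 \mu_0(t, Z; \theta_0)]\,dw(t)$ uniformly in $w$. The second part converges uniformly to its expectation over $\Theta$ as a consequence of the Glivenko-Cantelli property of classes of functions satisfying Property 2. This shows that

\[ \sup_w \left| \nabla_\theta^2 M_{n,w}^{-1}(\tilde\theta, \mu_0) - \nabla_\theta^2 M_w^{-1}(\theta_0, \mu_0) \right| = o_P(1). \]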

w 

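On the other hand, we write

\[
\nabla_\theta M_{n,w}(\theta_0, \mu_0) = -2 \left[ \hat S_n(\nabla_\theta \mu_0(\cdot, \cdot; \theta_0), w) - \frac{1}{n}\sum_{i=1}^n \int_0^{\tau_H} \mu_0(t, Z_i; \theta_0) \nabla_\theta \mu_0(t, Z_i; \theta_0)\,dw(t) \right]
- \frac{2}{n}\sum_{i=1}^n \int_{T_{(n)}}^{\tau_H} \mu_0(t, Z_i; \theta_0) \nabla_\theta \mu_0(t, Z_i; \theta_0)\,dw(t).
\]

Using Lebesgue's dominated convergence theorem, the last term tends uniformly to zero at an $n^{-1/2}$ rate. Finally, the asymptotic representation follows from Lemma 3.1. $\square$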

3.4 Asymptotic normality of $\hat\theta$ in the semiparametric case

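Theorem 3.3. Assume that (2.2) holds. Under Assumptions 1 to 4 and 6 to 9, the estimator in (2.8) admits the following asymptotic representation

\[
\hat\theta(w) - \theta_0 = \Sigma_{w,sp}^{-1} \left\{ \frac{1}{n}\sum_{i=1}^n \left( \int_0^{\tau_H} [Y_i(t) - \mu_{\theta_0}(t, \theta_0' Z_i)]\,\nabla_\theta \mu_{\theta_0}(t, Z_i)\,dw(t)
+ \int_0^{\tau_H}\!\!\int_0^t \eta_{s-}(T_i, \delta_i)\,E[\nabla_\theta \mu_{\theta_0}(t, Z)\,d\mu_{\theta_0}(s, \theta_0' Z)]\,dw(t) \right) \right\} + R_n(w),
\]

where $\sup_{w \in \mathcal{W}} |R_n(w)| = o_P(n^{-1/2})$. As a consequence, for any $w \in \mathcal{W}$,

\[ \sqrt{n}\,(\hat\theta(w) - \theta_0) \Longrightarrow \mathcal{N}(0, V_{w,sp}), \]

where $V_{w,sp} = \Sigma_{w,sp}^{-1} \Delta_{w,sp} \Sigma_{w,sp}^{-1}$ and $\Delta_{w,sp}$ is the covariance matrix of each term of the i.i.d. sum in the asymptotic expansion.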

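Proof. The consistency of the preliminary estimator can be proved in the same way as in the proof of Theorem 3.2, using now the second part of Lemma 3.1 and the uniform consistency of $\hat\mu_\theta$ (Assumption 7). Asymptotic normality comes from the fact that

\[ \hat\theta - \theta_0 = -\nabla_\theta^2 M_{n,w}^{-1}(\tilde\theta, \hat\mu_{\tilde\theta})\,\nabla_\theta M_{n,w}(\theta_0, \hat\mu_{\theta_0}) \]

for some $\tilde\theta$ between $\hat\theta$ and $\theta_0$. The fact that

\[ \sup_w \left\| \nabla_\theta^2 M_{n,w}^{-1}(\tilde\theta, \hat\mu_{\tilde\theta}) - \nabla_\theta^2 M_w^{-1}(\theta_0, \mu_{\theta_0}) \right\| = o_P(1) \]

can be shown in the same way as in the proof of Theorem 3.2, using now the second part of Lemma 3.1. The main issue consists of proving the asymptotic representation of $\nabla_\theta M_{n,w}(\theta_0, \hat\mu_{\theta_0})$. Write

\[ \nabla_\theta M_{n,w}(\theta_0, \hat\mu_{\theta_0}) = -2 \left[ \frac{1}{n}\sum_{i=1}^n \int_0^{T_{(n)}} \left( \hat Y_i(t) - \hat\mu_{\theta_0}(t, \theta_0' Z_i) \right) \nabla_\theta \hat\mu_{\theta_0}(t, Z_i)\,dw(t) \right]. \]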

Using the second part of Lemma 3.1, this can be rewritten as

^~i Jo MA*, tyo^iJ 



1=1 



+ R4n{w) 

\[ = \nabla_\theta M_{n,w}(\theta_0, \mu_{\theta_0}) + R_{1n}(w) + R_{2n}(w) + R_{3n}(w) + R_{4n}(w), \]

where $R_{4n}(w)$ comes from Lemma 3.1 and the change in the integration bound of $[0, T_{(n)}]$ by $[0, \tau_H]$. Using the same arguments as in the proof of Theorem 3.2, we deduce that $\sup_w \|R_{4n}(w)\| = o_P(n^{-1/2})$. Using the uniform convergence rates of $\hat\mu_{\theta_0}$ and of $\nabla_\theta \hat\mu_{\theta_0}$, we get straightforwardly that $\sup_w \|R_{3n}(w)\| = o_P(n^{-1/2})$. Using the uniform convergence of $\nabla_\theta \hat\mu_{\theta_0}$, we see that the term $R_{1n}$ can be decomposed into

\[ R_{1n}(w) = n^{-1}\sum_{i=1}^n \left( f_w(Z_i, Y_i) - f_{n,w}(Z_i, Y_i) \right), \]

where $f_w$ and $f_{n,w}$ both belong (almost surely for $n$ large enough) to the class $\mathcal{G}$ defined in Assumption 9, with $\sup_w \|f_w - f_{n,w}\|_\infty \to 0$ a.s. Therefore, using the asymptotic equicontinuity of the Donsker class $\mathcal{G}$ (see e.g. Section 2.1.2 in Van der Vaart & Wellner (1996)), this shows that

\[ \sup_w \left\| R_{1n}(w) - \int \left( f_w(z,y) - f_{n,w}(z,y) \right) dP_{Z,Y}(z,y) \right\| = o_P(n^{-1/2}). \]

Moreover, it is clear that $\int (f_w(z,y) - f_{n,w}(z,y))\,dP_{Z,Y}(z,y) = 0$, using the fact that $\nabla_\theta \hat\mu_{\theta_0}(t, z) - \nabla_\theta \mu_{\theta_0}(t, z)$ is a function of $z$ only and that $E[\mu_{\theta_0}(t, \theta_0' Z_i) - Y_i(t) \mid Z_i] = 0$.

The term $R_{2n}(w)$ can be handled in the same way, using now the Donsker class $\mathcal{H}$ in Assumption 9, observing that $\hat\mu_{\theta_0}(t, \theta_0' z) - \mu_{\theta_0}(t, \theta_0' z)$ is a function of $\theta_0' z$ only and getting from Lemma 6.5 that $E[\nabla_\theta \mu_{\theta_0}(t, Z) \mid \theta_0' Z] = 0$. $\square$

3.5 Adaptive choice of w 



The representations of Theorems 3.2 and 3.3 hold uniformly in $w \in \mathcal{W}$. Therefore, the asymptotic normality of our estimators of the parameter remains valid if we replace $w$ by a data-driven measure $\hat w$ that converges to a specific optimal measure $w_0$. We give some indications on a method to obtain such a data-driven measure adapted to our estimation problem.

The empirical measure $\hat w$ will be defined as the minimizer of some criterion. Since it is generally impossible to perform minimization over the functional space $\mathcal{W}$, we minimize over a growing subset $\mathcal{W}_n$. The adaptive procedure we propose consists of first estimating the asymptotic covariance matrix $V_{w,sp}$ (or $V_{w,p}$ in the parametric case) for any $w \in \mathcal{W}_n$. From the asymptotic variance estimators, we derive an estimate of the mean squared error $E[\|\hat\theta(w) - \theta_0\|^2]$. We then take $\hat w$ as the element of $\mathcal{W}_n$ such that the estimated mean squared error is minimal over $\mathcal{W}_n$. Then, our final estimator is

\[ \hat\theta = \hat\theta(\hat w). \]

The uniform convergence of the remainder term in the representations of Theorems 3.2 and 3.3 provides the asymptotic normality of $\hat\theta$ in the case where $\Sigma_{\hat w} \to \Sigma_{w_0}$ a.s. for some $w_0 \in \mathcal{W}$.
3.6 Estimation of the variance



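We show how to estimate the variance in the representation of Theorem 3.3 and we propose an estimator of the mean squared error of $\hat\theta$. Denote by $\xi_{n,w}$ the term between brackets in the representation of Theorem 3.3, so that

\[ \hat\theta(w) - \theta_0 = \Sigma_{w,sp}^{-1}\,\xi_{n,w} + R_n(w), \]

where $\sup_{w \in \mathcal{W}} |R_n(w)| = o_P(n^{-1/2})$. The quantity $\xi_{n,w}$ can be estimated in the following way:

\[ \hat\xi_{n,w} = \frac{1}{n}\sum_{i=1}^n \hat\psi(\delta_i, Z_i, T_i, \hat Y_i; w), \]

where

\[
\hat\psi(\delta, Z, T, Y; w) = \int_0^{T_{(n)}} \left( Y(t) - \hat\mu_{\hat\theta}(t, \hat\theta' Z) \right) \nabla_\theta \hat\mu_{\hat\theta}(t, Z)\,dw(t)
+ \int_0^{T_{(n)}}\!\!\int_0^t \hat\eta_{s-}(T, \delta)\,n^{-1}\sum_{i=1}^n \left( \nabla_\theta \hat\mu_{\hat\theta}(t, Z_i)\,d\hat\mu_{\hat\theta}(s, \hat\theta' Z_i) \right) dw(t),
\]

\[ \hat\eta_t(T, \delta) = \frac{(1 - \delta) I(T \leq t)}{1 - \hat H(T-)} - \int_0^t \frac{I(T \geq s)\,d\hat G(s)}{(1 - \hat H(s-))(1 - \hat G(s-))}, \]

and $\hat H$ is the empirical estimator of $H$. Therefore, the quantity $\Delta_{w,sp}$ can be estimated by

\[ \hat\Delta_{w,sp} = \frac{1}{n}\sum_{i=1}^n \left( \hat\psi(\delta_i, Z_i, T_i, \hat Y_i; w) - \frac{1}{n}\sum_{j=1}^n \hat\psi(\delta_j, Z_j, T_j, \hat Y_j; w) \right)^{\otimes 2}, \]

where $\otimes 2$ denotes the product of the matrix with its transpose. To consistently estimate $\Sigma_{w,sp}$, we use

\[ \hat\Sigma_{w,sp} = \frac{1}{n}\sum_{i=1}^n \int_0^{T_{(n)}} \nabla_\theta \hat\mu_{\hat\theta}(t, Z_i) \nabla_\theta \hat\mu_{\hat\theta}(t, Z_i)'\,dw(t). \]

A consistent estimator of $V_{w,sp}$ can then be computed from $\hat V_{w,sp} = \hat\Sigma_{w,sp}^{-1} \hat\Delta_{w,sp} \hat\Sigma_{w,sp}^{-1}$. Finally, we take $\hat E_w = \hat\xi_{n,w}' \hat\Sigma_{w,sp}^{-1} \hat\Sigma_{w,sp}^{-1} \hat\xi_{n,w}$ as the mean squared error estimate.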
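The final plug-in step has the familiar sandwich form. The sketch below illustrates only that last step, assuming the per-observation terms and the estimated matrix of gradient outer products have already been computed by some other routine; the function name `sandwich` and its arguments are hypothetical.

```python
import numpy as np

def sandwich(psi, sigma_hat):
    """Sandwich covariance estimate from precomputed ingredients.

    psi: (n, d) array whose rows are the estimated per-observation terms;
    sigma_hat: (d, d) estimate of the matrix of gradient outer products.
    Returns (V_hat, Delta_hat) with V_hat = sigma_hat^{-1} Delta_hat sigma_hat^{-1}.
    """
    psi = np.asarray(psi, dtype=float)
    centered = psi - psi.mean(axis=0)                  # subtract the empirical mean
    delta_hat = centered.T @ centered / psi.shape[0]   # empirical covariance (1/n scaling)
    sigma_inv = np.linalg.inv(sigma_hat)
    return sigma_inv @ delta_hat @ sigma_inv, delta_hat

# Toy usage with made-up numbers.
psi_demo = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
V_demo, Delta_demo = sandwich(psi_demo, 2.0 * np.eye(2))
```

In an adaptive procedure, such a routine would be called once per candidate measure, and the candidate with the smallest estimated mean squared error would be retained.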

3.7 Estimation of the nonparametric part 

In the semiparametric model, estimation of the finite dimensional parameter $\theta_0$ is only the first step of the method. With our estimator $\hat\theta$ at hand, we wish to estimate the conditional mean function $\mu(t|z)$. Different strategies can be proposed to perform this estimation. For this final estimator, there is no theoretical need to use the same kind of nonparametric estimator as in the computation of $\hat\theta$. Proposition 3.4 below states that, under some convergence assumptions for the nonparametric estimator used in this second step, the asymptotic behavior of the final semiparametric estimator of $\mu$ is identical to the asymptotic behavior of a purely nonparametric estimator in the case where $\theta_0$ is exactly known.

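Proposition 3.4. Let $\Theta^*$ be some neighborhood of $\theta_0$, and let $\mathcal{T}$ be a set on which
$\sup_{\theta\in\Theta^*,\,t\in\mathcal{T},\,z\in\mathcal{Z}} \|\nabla_\theta\mu_\theta(t,z)\| < \infty$. Let $\hat\mu_\theta$ be a family of nonparametric estimators of
$\mu_\theta$ satisfying the assumption

$$\sup_{\theta\in\Theta^*,\,t\in\mathcal{T},\,z\in\mathcal{Z}} \|\nabla_\theta\hat\mu_\theta(t,z) - \nabla_\theta\mu_\theta(t,z)\| = o_P(1). \qquad (3.2)$$

Then, we have

$$\sup_{t\in\mathcal{T},\,z\in\mathcal{Z}} |\hat\mu_{\hat\theta}(t,\hat\theta'z) - \hat\mu_{\theta_0}(t,\theta_0'z)| = O_P(n^{-1/2}).$$

Proof. This is a direct consequence of a Taylor expansion of $\hat\mu_\theta$ around $\theta_0$. From Theorem
3.3 we have $\hat\theta - \theta_0 = O_P(n^{-1/2})$. Then, the boundedness of $\nabla_\theta\mu_\theta(t,z)$ and the uniform
convergence in assumption (3.2) give the result. □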
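The Taylor step of this proof can be written out as follows. This is only a sketch under the assumptions of the proposition; the intermediate point $\bar\theta$, lying between $\theta_0$ and $\hat\theta$, is notation introduced here for illustration, and the argument additionally assumes that $\hat\theta$ belongs to a convex neighborhood $\Theta^*$ with probability tending to one.

```latex
\begin{align*}
\hat\mu_{\hat\theta}(t,\hat\theta'z) - \hat\mu_{\theta_0}(t,\theta_0'z)
  &= \nabla_\theta \hat\mu_{\bar\theta}(t,z)'\,(\hat\theta - \theta_0), \\
\sup_{t\in\mathcal{T},\,z\in\mathcal{Z}}
  \bigl|\hat\mu_{\hat\theta}(t,\hat\theta'z) - \hat\mu_{\theta_0}(t,\theta_0'z)\bigr|
  &\le \Bigl(\sup_{\theta\in\Theta^*,\,t,\,z}\|\nabla_\theta \mu_\theta(t,z)\| + o_P(1)\Bigr)\,
      \|\hat\theta - \theta_0\|
   = O_P(n^{-1/2}).
\end{align*}
```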

In the kernel estimator example of equation (2.10), a crucial issue is the choice
of the bandwidth, which strongly influences the performance of the nonparametric
estimation. A first method to define our final estimator of $\mu_{\theta_0}$ consists of using an arbitrary
sequence of bandwidths $h$ to compute $\hat\theta$, then of using cross-validation techniques to select
a bandwidth $\hat h$. The final estimator is then set as $\hat\mu_{\hat\theta,\hat h}(t,\hat\theta'z)$. However, it seems more
appealing to us to define a procedure which can be seen as an extension of the adaptive
choice of bandwidth proposed by Härdle et al. (1993) and Delecroix et al. (2006). An
interesting feature of this technique is that it selects an adaptive bandwidth $\hat h$ and a
direction $\hat\theta$ at the same time. Indeed, define

$$(\hat\theta, \hat h) = \arg\min_{\theta\in\Theta,\,h\in\mathcal{H}} M_{n,w}(\theta, \hat\mu_{\theta,h}). \qquad (3.3)$$

The uniform-in-bandwidth consistency of the kernel estimators we use (see Section 6.2)
ensures that $\hat\theta$ has the same asymptotic properties as in Theorem 3.3. On the other
hand, Proposition 3.5 below shows that the adaptive bandwidth $\hat h$ is asymptotically
equivalent to the bandwidth we could obtain using a classical cross-validation technique in the
case where the parameter $\theta_0$ is exactly known.
Proposition 3.5. For some positive constants $a$, $b$ and $\alpha$, let $\mathcal{H}_n = [an^{-\alpha}, bn^{-\alpha}]$ be a set
of bandwidths satisfying Assumption 10 and let

$$h_0 = \arg\min_{h\in\mathcal{H}_n} M_{n,w}(\theta_0, \hat\mu_{\theta_0,h}).$$

Under the assumptions of Theorem 3.3 and provided that $\sup_{\theta,\,h,\,t,\,z} |\hat\mu_{\theta,h}(t,\theta'z) - \mu_{\theta,h}(t,\theta'z)| = o_P(1)$, we have

$$\hat h/h_0 \to 1 \quad \text{a.s.}$$

Proof. Define $\phi(h/h_0) = M_{n,w}(\theta_0, \hat\mu_{\theta_0,h})$ and $\phi_n(h/h_0) = \min_{\theta\in\Theta} M_{n,w}(\theta, \hat\mu_{\theta,h})$. By
definition of $h_0$ and $\hat h$ we have $\arg\min_x \phi(x) = 1$ and $\hat h/h_0 = \arg\min_x \phi_n(x)$.
Now write

2 " r'^H 

(pnix) = (f){x) 

n 



J2 / Y,{t){P'e,xhoit,0'Zi)-ixe,hoit,e'Zi))dwit) 
+ -y2 [ " {f^e,xh,{t,0' Z,f - fie,h,{t,e' ZiY)dw{t) + Mn,n,{0, fie. 



ho/ 



Using Lemma 3.1 and the uniform-in-bandwidth consistency of $\hat\mu_{\theta,h}$, the second and third
terms in the decomposition tend to zero uniformly in $x$. On the other hand, the last term
does not depend on $x$. This shows that $\hat h/h_0 \to 1$ a.s. □

4 Simulations 

We present here some empirical evidence of the good behavior of our semiparametric
estimation procedure for finite sample sizes.

In our simulation study, we consider the case where, conditionally on $Z_i$, the process $N^*$
is a homogeneous Poisson process with intensity $\theta_0'Z_i + 5$, that is

$$E[N^*(t)|Z_i] = (\theta_0'Z_i + 5)\,t, \qquad i = 1,\dots,n.$$

We take $\theta_0 = (1, 1.6, 1.25, 0.7)'$ and we consider 4-dimensional covariates $Z_i \sim \otimes^4\,\mathcal{U}[1,2]$
for $i = 1,\dots,n$. The variables $D_i$ for $i = 1,\dots,n$ are generated according to a Weibull
distribution with parameters (10, 1.09). The censoring distribution is selected to be Weibull
with parameters $(4, \lambda)$. Taking $\lambda = 1.38$ or $\lambda = 1$ leads respectively to 30% or 50% of
censoring and an average of 20 or 18 recurrent events per sample. In our results, we
emphasize the impact of the two parameters involved in our semiparametric procedure,
namely the bandwidth of the nonparametric kernel estimators and the measure $w$.
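A data-generating step of this kind can be sketched in a few lines. The fragment below is an illustration under explicit assumptions rather than the authors' code: the Weibull parameters are read as (shape, scale), the indicator is set to 1 when $D_i$ is observed, and the function name `simulate_sample`, the seed and the random generator are arbitrary choices.

```python
import numpy as np

def simulate_sample(n, lam, theta0=(1.0, 1.6, 1.25, 0.7), seed=0):
    """Simulate one sample from a design like the one described above.

    Assumptions (not stated in the text): Weibull parameters are read as
    (shape, scale); delta = 1 means the terminal time D is observed.
    """
    rng = np.random.default_rng(seed)
    theta0 = np.asarray(theta0)

    Z = rng.uniform(1.0, 2.0, size=(n, 4))   # covariates, uniform on [1, 2]^4
    rate = Z @ theta0 + 5.0                  # Poisson intensity theta0'Z + 5
    D = 1.09 * rng.weibull(10.0, size=n)     # terminal times, Weibull(10, 1.09)
    C = lam * rng.weibull(4.0, size=n)       # censoring times, Weibull(4, lam)
    T = np.minimum(D, C)                     # observed follow-up time
    delta = (D <= C).astype(int)             # 1 if D is observed, 0 if censored

    events = []
    for i in range(n):
        # Homogeneous Poisson process on [0, T_i]: draw the count, then
        # place the event times uniformly and sort them.
        k = rng.poisson(rate[i] * T[i])
        events.append(np.sort(rng.uniform(0.0, T[i], size=k)))
    return Z, T, delta, events
```

Calling `simulate_sample(100, 1.38)` or `simulate_sample(100, 1.0)` would correspond to the two censoring scenarios, up to the parameterization assumed here.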
First, we consider the case of a fixed bandwidth and show how the adaptive choice of
$w$ can improve the estimation performance of the parameter $\theta_0$. The nonparametric
estimators are kernel estimators computed using the Epanechnikov kernel and a bandwidth
$h_0 = 0.2$. We consider a set of discrete measures supported on $\mathcal{X} = \{0.1, 0.2, \dots, 1.2\}$.
Hence, for any function $f$, the integral with respect to $w$ reduces to a finite sum. Indeed,
we have

$$\int f(t)\,dw(t) = \sum_{k\in\mathcal{X}} f(k)\,w(\{k\}).$$

Moreover, we consider only a finite number of choices for the weights $w(\{k\})$, that is

$$w(\{k\}) = 1 \quad \text{for } k = 0.1, \dots, 0.8,$$

$$w(\{k\}) \in \{0.25, 0.5, 0.75, 1\} \quad \text{for } k = 0.9, 1, 1.1, 1.2.$$

The intuition is that our procedure should allocate smaller weights to large values of $T_i$,
since the Kaplan-Meier estimator is known to behave poorly in this
part of the distribution (and contributes significantly to the variance). Our estimator
$\hat\theta = \hat\theta(\hat w, h_0)$ is then compared to the estimator $\tilde\theta$ obtained for the measure $w_0$ which puts
mass 1 at every point of $\mathcal{X}$.
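The finite family of candidate measures and the finite-sum integral can be made concrete as follows. This is a minimal sketch; the names `support`, `candidate_weights` and `integrate` are illustrative and do not come from the paper.

```python
import itertools
import numpy as np

# Support of the discrete measures: 0.1, 0.2, ..., 1.2
support = np.round(np.arange(1, 13) * 0.1, 1)

def candidate_weights():
    """Enumerate all candidate weight vectors (w({k})) for k in the support."""
    fixed = [1.0] * 8                      # weight 1 on 0.1, ..., 0.8
    free = [0.25, 0.5, 0.75, 1.0]          # allowed values on 0.9, 1, 1.1, 1.2
    return [np.array(fixed + list(tail))
            for tail in itertools.product(free, repeat=4)]

def integrate(f, weights):
    """Integral of f with respect to the discrete measure: sum_k f(k) w({k})."""
    return float(np.sum(f(support) * weights))
```

The enumeration yields $4^4 = 256$ candidate measures, and the all-ones vector corresponds to the reference measure that puts mass 1 at every point of the support.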

In the table below, we report our results over 100 simulations of samples of size 100
for two different rates of censoring ($p = 30\%$ and $p = 50\%$). Recalling that the first
component of $\theta_0$ is imposed to be one, we only have to estimate the three other components.
For each estimator, the Mean Squared Error (MSE) $E(\|\hat\theta - \theta_0\|^2)$ is decomposed into bias
and variance.



p = 30%

Estimator                                      Bias                                                        Variance                                                                                                    MSE
$\tilde\theta$ (measure $w_0$)                 $\begin{pmatrix} -0.322 \\ -0.198 \\ -0.02 \end{pmatrix}$    $\begin{pmatrix} 0.452 & 0.111 & 0.041 \\ 0.111 & 0.42 & 0.009 \\ 0.041 & 0.009 & 0.249 \end{pmatrix}$    1.264
$\hat\theta$ (adaptive measure $\hat w$)       $\begin{pmatrix} -0.129 \\ -0.162 \\ -0.042 \end{pmatrix}$   $\begin{pmatrix} 0.2 & 0.062 & 0.047 \\ 0.062 & 0.272 & -0.004 \\ 0.047 & -0.004 & 0.168 \end{pmatrix}$   0.685

p = 50%

Estimator                                      Bias                                                        Variance                                                                                                    MSE
$\tilde\theta$ (measure $w_0$)                 $\begin{pmatrix} -0.428 \\ -0.324 \\ -0.05 \end{pmatrix}$    $\begin{pmatrix} 0.478 & 0.129 & 0.156 \\ 0.129 & 0.386 & 0.034 \\ 0.156 & 0.034 & 0.335 \end{pmatrix}$    1.49
$\hat\theta$ (adaptive measure $\hat w$)       $\begin{pmatrix} -0.276 \\ -0.287 \\ -0.096 \end{pmatrix}$   $\begin{pmatrix} 0.242 & 0.035 & 0.033 \\ 0.035 & 0.234 & 0.023 \\ 0.033 & 0.023 & 0.199 \end{pmatrix}$    0.843

We also computed the average weights of $\hat w$ for the last four points of $\mathcal{X}$. For 30%
of censoring, we have $E[\hat w(\{0.9\})] = 0.777$, $E[\hat w(\{1\})] = 0.652$, $E[\hat w(\{1.1\})] = 0.607$
and $E[\hat w(\{1.2\})] = 0.535$, and for 50% of censoring, $E[\hat w(\{0.9\})] = 0.782$, $E[\hat w(\{1\})] =
0.682$, $E[\hat w(\{1.1\})] = 0.575$ and $E[\hat w(\{1.2\})] = 0.487$. Clearly, choosing the measure from
the data improves both the bias and the variance of our estimator. Moreover, the weights
of $\hat w$ get smaller for large values of $k$, especially when the proportion of censored data
is high. Consequently, the adaptive measure seems to have a significant impact on the
quality of the estimation of $\theta_0$.
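The decomposition used in the tables above rests on the standard identity $E\|\hat\theta - \theta_0\|^2 = \|E\hat\theta - \theta_0\|^2 + \mathrm{tr}\,\mathrm{Var}(\hat\theta)$. As a quick check, the short sketch below (the helper name `mse_from_bias_variance` is illustrative) recombines the bias vector and variance matrix of the first estimator reported for $p = 30\%$.

```python
import numpy as np

def mse_from_bias_variance(bias, variance):
    """MSE of a vector estimator: squared norm of the bias plus trace of the variance."""
    bias = np.asarray(bias, dtype=float)
    variance = np.asarray(variance, dtype=float)
    return float(bias @ bias + np.trace(variance))

# Values reported for p = 30%, first estimator of the table
bias = [-0.322, -0.198, -0.02]
variance = [[0.452, 0.111, 0.041],
            [0.111, 0.42, 0.009],
            [0.041, 0.009, 0.249]]
print(round(mse_from_bias_variance(bias, variance), 3))  # 1.264
```

The recombined value agrees with the MSE entry reported for that row.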

Next, we show how the choice of the parameter $h$ influences the quality of estimation.
We consider the fixed measure $w_0$ which puts the same weight 1 at each point. The
bandwidth $h$ is chosen adaptively on a regular grid of step 0.05 in the set [0.05, 0.3]. The
performance of the resulting estimator, presented below, is compared with the estimator $\tilde\theta$
(fixed measure $w_0$, bandwidth $h_0$) of the previous table and shows a significant improvement in MSE.
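The grid search over bandwidths can be sketched generically. In the fragment below, `criterion` is a hypothetical user-supplied function standing in for the minimized criterion of the procedure at a given bandwidth; it is not defined in the paper in this form, and `select_bandwidth` is an illustrative name.

```python
import numpy as np

def select_bandwidth(criterion, grid=None):
    """Return the bandwidth on the grid that minimises a user-supplied criterion.

    criterion : callable h -> float (hypothetical; for instance the value of
                the estimation criterion minimised over theta for this h).
    """
    if grid is None:
        grid = np.round(np.arange(1, 7) * 0.05, 2)   # 0.05, 0.10, ..., 0.30
    values = np.array([criterion(h) for h in grid])
    return float(grid[np.argmin(values)])
```

With the default grid, six bandwidths are compared, so the cost is six evaluations of the criterion per simulated sample.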





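Estimator                                   Bias                                                         Variance                                                                                                       MSE
$\hat\theta(w_0, \hat h)$, $p = 30\%$       $\begin{pmatrix} -0.19 \\ -0.155 \\ 0.084 \end{pmatrix}$      $\begin{pmatrix} 0.216 & 0.08 & -0.08 \\ 0.08 & 0.351 & -0.009 \\ -0.08 & -0.009 & 0.174 \end{pmatrix}$      0.967
$\hat\theta(w_0, \hat h)$, $p = 50\%$       $\begin{pmatrix} -0.281 \\ -0.309 \\ -0.114 \end{pmatrix}$    $\begin{pmatrix} 0.244 & -0.056 & -0.081 \\ -0.056 & 0.256 & 0.027 \\ -0.081 & 0.027 & 0.17 \end{pmatrix}$   1.126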



5 Conclusion 

We proposed a new procedure to estimate the conditional cumulative mean function of
the recurrent event process. We considered both parametric and semiparametric models
for the conditional cumulative mean function. Our semiparametric single-index model
can be seen as a generalization of both the Cox model and the accelerated failure time
model. Moreover, a new feature of our procedure is the measure $w$ involved in our
estimators, which is designed to protect against problems in the tail of the distribution
due to the presence of censoring. We then proposed a data-driven method to choose this
measure adaptively. Our criterion is based on the minimization of the mean squared error
for the estimation of $\theta_0$, but our procedure is flexible enough to allow the use of any other
criterion better adapted to the context. For example, we could consider a criterion directly
based on the error of the estimation of $\mu$.

In this work, we mainly focused on kernel estimators for estimating the nonparametric
part of our model, providing methods to choose the smoothing parameters from the data.
Nevertheless, all our results remain valid for a general class of nonparametric estimators
and rely only on convergence properties. Hence, other kinds of estimators may be used
provided they satisfy these conditions.
6 Appendix

6.1 Proof of Lemma 3.1

Let 

^n'"'(/,^) = - V / Y,{t)f{Z,,t)dw{t). 
nttJo 

Write 



= ^r'"^(/,^)+i?„(/,^). 

Decompose / into its positive and negative parts denoted respectively by /"*" and /~. 

rp rp 

The expectations of the two resuhing sums Sn"^ (/'*', w) and Sn"^ {f~ ,w) go to zero faster 
than n~^/^ using Lebesgue's dominated convergence. This entails that 

sup \sI'''\f,w)-SM,w)\ = opin''/''). 

Let $\tau < \tau_H$ and define $w_\tau(t) = w(t)I(t \le \tau)$. On $[0,\tau]$, we use the asymptotic i.i.d.
expansion of the Kaplan-Meier estimator $\hat G$ proposed by Gijbels & Veraverbeke (1991),
which can also be deduced from Stute (1995):

$$\frac{\hat G(t) - G(t)}{1 - G(t)} = \frac{1}{n}\sum_{i=1}^n \eta_t(T_i,\delta_i) + R_n(t),$$

where $\sup_{t\le\tau} |R_n(t)| = O_P(n^{-1}\log n)$ and

$$\eta_t(T,\delta) = \frac{(1-\delta)I(T\le t)}{1-H(T-)} - \int_0^t \frac{I(T\ge s)\,dG(s)}{(1-H(s-))(1-G(s-))}.$$

Moreover, recall that $\sup_{t\le\tau} |\hat G(t)-G(t)| = O_P(n^{-1/2})$ (see Gill (1983), Theorem 2.1) and
that $\sup_{t\le\tau}(1 - G(t))(1 - \hat G(t))^{-1} = O_P(1)$ (see Gill (1983), Lemma 2.6). Then, write

^ li Jo Jo l-G(s-) 
Using the fact that $\mathcal{F}$ is a uniformly bounded class, that $\int dw_\tau < C_0$ from Assumption 3
and that $E[N_i(\tau)] < \infty$ for all $\tau$, we deduce that $\sup_{f,w} |R'_n(f,w_\tau)| = O_P(n^{-1})$. The first
term in $R_n(f, w_\tau)$ can be rewritten as

^E/ '"'/ Vs-{T„Sj)E[f{Z,t)dfl{s\Z)]dWr{t)+j f ^5^^^'*(Z,,iVi,T,-,5,) j dWr{t), 



where 



ij 



f,t 



:Z.N,,T,,6,) = l\s4T„S,) {4%^^ -E[fiZ,t)d^,is\Z 



Observe that, with probability tending to one, the upper bound T(„) in the integrals can 
be replaced by r < th- Let j^f'^J-" and t, t' G [0, r]. We have 

\i:f'\Z„N„Tj,6,) - i:^'^'' {Z„Ni,Tj,S,)\ < Cr(\\f - f'WooN.ir) 

Ni{t)-N,{f) 



+|t-t'rsup 



(6.1) 



where Cr < oo and 7 > 0. Let Tir denote the set of all functions ^Z^-^'* when / ranges J-" 
and t ranges [0,r]. It follows from (16.1 p and Assumption H] that Tir is a || ■ II2— VC-class 
of functions. From this, using the Glivenko-Cantelli property of Tir, 



and 



sup 

f,t<r 



sup 

f,t<r 



1 " 

-y2^f'\Zi,Ni,T,,S,) 

i=l 



Op{n-'] 



Op(n 



-l^ 



since this can be seen as the supremum of a second order degenerate [/—process indexed 
by Tir (see Sherman 01994p ). This leads to the i.i.d. representation for Sn{f,Wr) for any 
T < Th. 
Similarly, write 

Snif, Wr) = Sj"^ (/, Wr) + Rn{f - /, ^r) + ^n(/, ^r) 

and using the fact that supjgjr||/ — /||oo = op(l) and that supj<^ IG'(t) — G{t)\ = 
Op{n~^/'^)., we deduce that supj-^|i?„(/ — /, Wr)| = Op{n~''^/'^). The representation for 

Sn{f,Wr) follows. 






Now, we make r tend to th- Let P^{f,w) = Sn{f,w) — Sn"\f,w) and P^{f,w) 

^ 'T' 

Sn{f,Wr) — Sn"\f,Wr)- Since the class J-" is uniformly bounded, we get 



1^ /•T'(„) rt 

:(Jr Jo (1-G(s-))(1-G(s-))' 
M' ^ /-^w Wo{s V t)\G{s-) - G{s-)\dNi{s) 



<-E 



j=i 



(1-G'(.-))(1-G(s-)) 



where the last inequality is obtained from Fubini's theorem and Assumption 3. From
Theorem 1.2 in Gill (1983), Assumption 3 and the fact that $\sup_{t<T_{(n)}} (1 - G(t-))(1 -
\hat G(t-))^{-1} = O_P(1)$ (see again Gill, 1983), we get that



n „T, 



^ i=l -^0 ^ 



(") W2is V r)diVi(s) 



where $A_n = O_P(n^{-1/2})$. The result follows from Lemma 6.6.



6.2 Uniform convergence of the nonparametric estimators 

In this section, we show that the kernel estimator $\hat\mu_{\theta,h}$ defined by (2.10) satisfies the
convergence rates required by Assumption 7. Introduce the quantity






e;ua-(^)(i-g(»-)) 



We first study the convergence rate of the difference between $\tilde\mu_{\theta,h}$ and $\mu_\theta$ and their
derivatives. Since no Kaplan-Meier functions are involved in this expression, we can use
classical results on uniform convergence of kernel estimators, mainly from Einmahl &
Mason (2005).

Assumption 10. Assume that

(1) $K$ has a compact support, say $[-1, 1]$, $\int K(s)\,ds = 1$ and $\sup_x |K(x)| < \infty$,

(2) $K$ is a twice differentiable kernel of order two with derivatives of order 0, 1 and 2
of bounded variation,

(3) $\mathcal{K} := \{K((x - \cdot)/h) : h > 0, x \in \mathbb{R}^d\}$ is a pointwise measurable class of functions,

(4) $h \in \mathcal{H}_n \subset [an^{-\alpha}, bn^{-\alpha}]$ with $a, b > 0$ and $\alpha \in (1/8, 1/5)$.
We also introduce a trimming function in order to avoid denominators close
to zero in the definition of $\hat\mu_{\theta,h}$. Indeed, to ensure uniform consistency of our estimator,
we need to bound this denominator away from zero. We use the same methodology as
in Delecroix et al. (2006). Let $f_{\theta'Z}$ denote the density of $\theta'Z$ and define the "ideal"
trimming function $J_{\theta_0}(\theta_0'Z,c) = I(\theta_0'Z \in B_0)$ where $B_0 = \{u : f_{\theta_0'Z}(u) > c\}$ for some
constant $c > 0$. As in Delecroix et al. (2006) (see also Lopez (2009)), we first assume
that we know some set $B$ on which $\inf\{f_{\theta'Z}(\theta'z) : z \in B, \theta \in \Theta\} > c$ where $c$ is a strictly
positive constant. In a preliminary step, we can use this set $B$ to compute the preliminary
trimming $J_B(z) = I(z \in B)$. Using this trimming function and a deterministic sequence
of bandwidths $h_0$ satisfying (4) in Assumption 10, we define a preliminary estimator $\theta_n$ of
$\theta_0$ as

$$\theta_n(w) = \arg\min_{\theta\in\Theta} M_{n,w}(\theta, \hat\mu_{\theta,h_0})\,J_B(z).$$

Given this preliminary consistent estimator of $\theta_0$, we use the following trimming $J_n(\theta_n'Z, c) =
I(f_{\theta_n'Z}(\theta_n'Z) > c)$, which appears to be asymptotically equivalent to $J_{\theta_0}(\theta_0'Z,c)$ (see e.g.
Lopez (2009)). Then, our final estimator consists of

$$\hat\theta(w) = \arg\min_{\theta\in\Theta_n} M_{n,w}(\theta, \hat\mu_\theta)\,J_n(\theta_n'Z, c),$$

where $\Theta_n$ is a shrinking neighborhood of $\theta_0$ defined according to our preliminary estimator $\theta_n$.

As announced, the next proposition gives the rates of convergence of $\tilde\mu_{\theta,h}$ and its
derivatives. Since we need a convergence over $\theta \in \Theta$, the trimming we need to use is $J_\theta(\theta'Z, c) :=
I(f_{\theta'Z}(\theta'Z) > c)$. But notice that $J_{\theta_0}(\theta_0'Z,c)$ can be replaced by $J_\theta(\theta'Z,c/2)$ on shrinking
neighborhoods of $\theta_0$.

Proposition 6.1. Under Assumption 10, for $z$ such that $J_\theta(\theta'z,c) > 0$, we have



nh 



sup 

t<T^„^,e,z,h V logn 



flg(t,9'z) - fie{t,6'z) 



nh^ 



t<T,„,,e^zM V log^ 



sup 



sup 



nh^ 



t<T^ri),d,z,h y logn 



Ue,{t,e',zY^+>^^ 



Vlfle{t,z) -Vlfi0it,z) 



fi0,{t,e',z)^^+'- 



Op (11 



Op(l) 



Op(l). 



(6.2) 
(6.3) 
(6.4) 



Proof. The proofs of (6.2)-(6.4) are all similar. The most delicate term to handle, coming
from (6.4), is



1 " 



nh^j^^fieo{t,e'oz)^^+^^ 



Z, - z? ^^„ fd'Z, - Q'z 



h 



' dNijs) 
l-G{s- 



Consider the class of functions $\mathcal{K}$ introduced in Assumption 10. From Nolan &
Pollard (1987), it can easily be seen that, using a kernel $K$ satisfying Assumption 10, for
some $c' > 0$ and $\nu > 0$, we have $N(\varepsilon, \mathcal{K}, \|\cdot\|_\infty) \le c'\varepsilon^{-\nu}$, $0 < \varepsilon < 1$.

Then, concerning the uniformity with respect to $\theta$, Lemma 22 (ii) of Nolan &
Pollard (1987) shows that the family of functions $\{(Z, N) \mapsto A^h_\theta(t, z)\}$ satisfies the
assumptions of Proposition 1 in Einmahl & Mason (2005).

Define 

"' dN{s) 



A',{t,z):=-E 



[z-z? ^^je'z-e'z 

-K 



A'e{t,z) 






E 



fieoit,e',z)^^+^-- 
{Z-zf 



h 



dN{s) 



1-^(5- 



e'z = u 



fe'z{u) 



[fie,{t,e'ozr^+^^ J, 1-G{s~) 

and apply Talagrand's inequality (see Talagrand (1994), see also Einmahl & Mason (2005))
to obtain that



sup 



|i„,,(t,z) - A^,,{t,z)\ = Op {n-'/'h-'/'{logny/') . 



For the bias term, classical kernel arguments (see for instance Bosq & Lecoutre (1997))
show that



sup \An,h{t,z) - AnjX^,z)\=0{h'^). 



D 



It remains to study $\hat\mu_{\theta,h} - \tilde\mu_{\theta,h}$. The following lemma gives some precision on the
difference between the Kaplan-Meier weights of $\hat\mu_{\theta,h}$ and the "ideal" weights involving the
true function $G$ in $\tilde\mu_{\theta,h}$.

Lemma 6.2. Let Ag(s) = (1 - G{s-))-^ , Ag(s) = (1 - G{s-))-^ and 

"* dG{s) 



Ccit) 



(1) We have 



{l-G{s-)){l-H{s-)Y 
sup 1^5^ = Op(l). 

<<T'(„) 1 - G{t) 
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(2) For alio < (3 <1 and e > 0, we have 
where sup^<j. -R„(s) = Op{n~^/'^). 



Proof. (1) This result is a consequence of Lemma 2.6 in Gill 

(2) For < /3 < 1 and e > 0, write 

1-G{s-) 
where Rg{s) = {G{s-) - G{s-)) (l - G'(s-))"\ Since J^" CG{s)'^-^'dCG{s) < oo 



apply Theorem 1 in Gill fll983p and use the first part of the current lemma to conclude 
the proof. 

D 

From the definition of our estimator, problems arise when studying $\hat\mu_{\theta,h}$ for $t$ in the tail
of the distribution. This is a common problem when studying Kaplan-Meier estimators,
but it can be circumvented by some moment conditions on the response and censoring
distributions. For instance, in the classical censored framework, Stute (1995) used the
function $C_G$ to compensate for the bad behavior of the Kaplan-Meier estimator in the tail of the
distribution. The following assumption gives a similar moment condition adapted to
our recurrent event context.
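To illustrate why the tail is delicate, the following minimal sketch computes a Kaplan-Meier estimate of the censoring distribution $G$. It is an illustration only, under assumptions made here: no ties among the observed times, and an indicator equal to 1 when the terminal time is observed, so that censoring corresponds to an indicator of 0. The function name `km_censoring_cdf` is hypothetical.

```python
import numpy as np

def km_censoring_cdf(T, delta):
    """Kaplan-Meier estimate of G(t) = P(C <= t) at the sorted observed times.

    T     : observed times min(D, C), assumed to have no ties.
    delta : 1 if the terminal time D is observed, 0 if the observation is censored.
    """
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    order = np.argsort(T, kind="stable")
    T_sorted = T[order]
    cens = 1 - delta[order]              # censoring plays the role of the "event"
    n = len(T_sorted)
    at_risk = n - np.arange(n)           # number of subjects with T_j >= T_(i)
    surv = np.cumprod(1.0 - cens / at_risk)
    return T_sorted, 1.0 - surv
```

Near the largest observed times the risk set contains only a handful of subjects, so each factor of the product changes by large jumps; this is the instability that the moment conditions are meant to control.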



Assumption 11. Assume that, for some e > 0, 

CG{ty/^^^' 

sup — — -T— < CX3 

and 

J^{l-G{s-))E[N*{s)dN*{s)] ^ 

sup -^^-^ —;^ < OO, 

(l-G(t-))%„(t,M)2A2 

where Ai and A2 are defined in Assumptions^ 

Therefore, these conditions allow us to consider a process $N^*$ and variables $D$ and $C$
that are supported on the whole interval $[0, \tau_H]$. However, they will hold true only if there
is enough information on the recurrent event process in the tails of the distribution. For
further illustration take, for k = 1 and 2, fig {t, u) ~ Cfc(l— G'(t))~'^^ for t in a neighborhood 
of Th, m — )■ 00 and where Ci, C2, /3i are three positive constants. Take also, for C3 > and 
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(32 > 0, 1 — F(t) ~ €3(1 — G{t)Y'^ for t in a neighborhood of t^- Then it can be shown 
that these conditions are verified for example in the case /3i > 1, /32 = 1 and Ai = A2 = 1. 
The next proposition gives the convergence rate of $\hat\mu_{\theta,h} - \tilde\mu_{\theta,h}$. Notice that if $w$ is
supported on a compact interval, we only need this result on a compact subset of $[0, T_{(n)}]$
and in this case Assumption 11 is automatically fulfilled.

Proposition 6.3. Under Assumptions 10 and 11, for $z$ such that $J_\theta(\theta'z,c) > 0$, we have

fie{t,9'z)-fie{t,9'z) _^ ^^-7/2o\ /g g^, 

(6.6) 
(6.7) 



sup 



sup h 

i<T(n),9,z,h 

sup h'^ 



Vehi^^z) -V0fi0{t,z) 






Op {n-'/'') , 
Op (n-^/^°) , 
Op {n-'/'') . 



Proof. We only prove (6.7) since (6.5) and (6.6) can be handled similarly. Let us consider
the following term involving the second derivative of $K$

\-,J2^Z,-zrK"( ^'^\^'' ] {^e,{tXzY^^'^f,z{e'z))-' [\ms) - A{s))dNd 



nh^ 



i=l 



h 



From Lemma 16. 2[ this term can be bounded by 

'e'z,-0'z' 

\ K" \ 

nh 



Op{n-^'^h- 



i=l 



fie,it,e',z)-^'^^'^^ / ~A{s)CGisf^'/'^'^dN,{.s 
where the $O_P$-rate does not depend on $t$, $\theta$, $z$ nor $h$. Now, consider the family of functions
indexed by $t$, $\theta$, $z$ and $h$,

'e'z-e'z' 



:z,N) 



K" 



h 



fig,{t,9',z)-('^^'-^ / A{s)CG{sy^'^''-'^dN{s) 



This family is Euclidean (see Nolan & Pollard (1987)) for an envelope

-y/3{ 

sup ^ 

t,z 



A{t)C^^^^^^'\t)N{t) 



which is, for $\beta = 7/10$, square integrable from Assumption 11. Then, using the results of
Sherman (1994), the second part of (6.8) is $O_P(1)$ uniformly in $t$, $\theta$, $z$ and $h$.

U 



Finally, combination of Propositions 6.1 and 6.3 leads to the following result.

Corollary 6.4. Under Assumptions 10 and 11, for $z$ such that $J_\theta(\theta'z,c) > 0$,



sup 



\fte{t, e'z) - ^^e{t, 9' z)\ ■ \\VeMt, z) - Vel^e{t, z) 



|/.,„(t,^^z)|2(Ai+A.) 



op[n 



^1/2n 
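6.3 Technical lemmas

6.3.1 Gradient vector in the single-index model

Lemma 6.5. If the function $\theta \mapsto \mu_\theta(t|\theta'z)$ is differentiable, we have

$$\nabla_\theta\mu_{\theta_0}(t|Z) = \mu'_{\theta_0}(t|\theta_0'Z)\,(Z - E[Z|\theta_0'Z]),$$

where $\mu'_{\theta_0}(t|u) = \partial_u\mu_{\theta_0}(t|u)$. As a consequence

$$E[\nabla_\theta\mu_{\theta_0}(t|Z)\,|\,\theta_0'Z] = 0. \qquad (6.9)$$

Proof. Observe that $\mu_\theta(t|\theta'Z) = E[\mu_{\theta_0}(t|\theta_0'Z)\,|\,\theta'Z]$ and let $\zeta(Z,\theta) = \theta_0'Z - \theta'Z$ for $\theta \in \Theta$.
We have

$$\mu_\theta(t|\theta'Z) = E[\mu_{\theta_0}(t|\zeta(Z,\theta) + \theta'Z)\,|\,\theta'Z].$$

Defining $\Gamma(\theta_1,\theta_2) = E[\mu_{\theta_0}(t|\zeta(Z,\theta_1)+\theta_2'Z)\,|\,\theta_2'Z]$, we have $\Gamma(\theta,\theta) = \mu_\theta(t|\theta'Z)$, which leads
to

$$\nabla_{\theta_1}\Gamma(\theta_1,\theta_0)|_{\theta_1=\theta_0} = -\mu'_{\theta_0}(t|\theta_0'Z)\,E[Z|\theta_0'Z].$$

Moreover, since $\zeta(Z,\theta_0) = 0$, we have $\Gamma(\theta_0,\theta_2) = \mu_{\theta_0}(t|\theta_2'Z)$, so that
$\nabla_{\theta_2}\Gamma(\theta_0,\theta_2)|_{\theta_2=\theta_0} = \mu'_{\theta_0}(t|\theta_0'Z)\,Z$. Summing the two gradients gives the result. □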

6.3.2 Auxiliary lemma for tightness conditions 

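Lemma 6.6. Let $\mathcal{F}$ be a class of functions. Let $P_n(t, f)$ be a process on $[0, \tau_H] \times \mathcal{F}$.
Define, for any $\tau \in [0, \tau_H]$, $R_n(\tau, f) = P_n(\tau_H, f) - P_n(\tau, f)$. Assume that for any $\tau$ such
that $\tau < \tau_H$

$$P_n(t, f) \Rightarrow W(V_f(t)) \in D([0, \tau]), \quad f \in \mathcal{F},$$

where $W(V_f(t))$ is a centered Gaussian process with covariance function $V_f$ and $D$ denotes
the set of càdlàg functions.

Assume that, for a sequence of random variables $(X_n)$ and two functions $\Gamma$ and $\Gamma_n$, the
following conditions hold

(1) $\lim_{\tau\to\tau_H} V_f(\tau) = V_f(\tau_H)$ with $\sup_{f\in\mathcal{F}} |V_f(\tau_H)| < \infty$,

(2) $|R_n(\tau', f)| \le X_n \times \Gamma_n(\tau)$ for all $\tau \le \tau' \le \tau_H$,

(3) $X_n = O_P(1)$,

(4) $\Gamma_n(\tau) \to \Gamma(\tau)$ in probability,

(5) $\lim_{\tau\to\tau_H} \Gamma(\tau) = 0$.

Then $P_n(\tau_H, f) \Rightarrow \mathcal{N}(0, V_f(\tau_H))$.

Proof. From Theorem 13.5 in Billingsley (1999) and condition (1), it suffices to show
that, for all $\varepsilon > 0$,

$$\lim_{\tau\to\tau_H}\limsup_{n\to\infty} P\Bigl(\sup_{\tau\le t\le\tau_H} |R_n(t,f)| > \varepsilon\Bigr) = 0. \qquad (6.10)$$

Using condition (2), the probability in equation (6.10) is bounded, for all $M > 0$, by

$$P(|\Gamma_n(\tau) - \Gamma(\tau)| > \varepsilon/M - \Gamma(\tau)) + P(X_n > M). \qquad (6.11)$$

Moreover, from condition (4), we can state that

$$\limsup_{n\to\infty} P(|\Gamma_n(\tau) - \Gamma(\tau)| > \varepsilon/M - \Gamma(\tau)) \le I(\varepsilon/M - \Gamma(\tau) \le 0).$$

Since $\Gamma(\tau) \to 0$ as $\tau \to \tau_H$ (condition (5)), we can deduce that

$$\lim_{\tau\to\tau_H}\limsup_{n\to\infty} P(|\Gamma_n(\tau) - \Gamma(\tau)| > \varepsilon/M - \Gamma(\tau)) = 0.$$

As a consequence,

$$\lim_{\tau\to\tau_H}\limsup_{n\to\infty} P\Bigl(\sup_{\tau\le t\le\tau_H} |R_n(t, f)| > \varepsilon\Bigr) \le \lim_{M\to\infty}\limsup_{n\to\infty} P(X_n > M) = 0,$$

using condition (3). □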

6.3.3 Covering number results 

In this section, we determine the covering numbers of some particular classes of functions. 
From these computations, we can easily deduce sufficient conditions to check Property [2] 
and Assumption O 

Proposition 6.7. Let T be a class of functions f{t, z) with envelope F defined onM.^W^ 
with continuous derivative with respect to the first component. Let F be the enveloppe of 
the class of functions df{s,z)/ds. Let Wo{t) be a positive bounded decreasing function 
and set W = {w : dw{t) = Wo(t)dw{t),w G W} where W is a class of monotone positive 
functions with envelope function W . 
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Assume that E[{£" F{t,z)Wo{t)dY{t)Y] < oo, E[{£" F{t,z)Y{t)dWo{t))^] < oo and 
E[{£" F{t,z)Wo{t)Y{t)dt)'] < oo. 

Then, the class of functions "H = {{z,y) — ?■ J^" f{t,z)y{t)dw{t), f & J^,w E W} has a 
uniform covering number satisfying, for some constant C 

N{e, n, II ■ II2) < CN{e, W^T , \\ ■ \\^)N{e, W, || -lb). 

Proof. Let Q be a probability measure and introduce 

Ni = supg N{e\\WoF\\Q, WoT, \\ ■ hq) and N2 = supg N{e\\W\\Q, W, || ■ ||2,q). Let 

{fi °}i<j<Afi (respectively {wj}i<j<N2) be the center of the e — || ■ ||2,q balls needed to 

cover WqJ^ (respectively W). Writing dw{t) = Wo{t)dw{t), we have for any 1 < i < Ni 

and 1 < j < A^2 



< 



Yit)fit,z)Woit)dwit)- / Yit)f^''{t,z)dw,it) 

Jo 

Y{t){f{t,z)Wo{t) - fr'{t,z))dw,{t) 



Y it) fit, z)Wo{t){dw - dwj){t) 




For any f E T, there exists some i such that the first term is seen to be less than Cie in 
L^{Q)— norm. For the second term, there also exists some j such that this is smaller than 
€26, which can be seen using integration by parts. The result follows. D 